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Abstract 

Wc discuss the close connection between eigenvalue computation and optimization using 
the Newton method and subspace methods. From the connection we derive a new class of 
Newton updates. The new update formulation is similar to the well-known Jacobi-Davidson 
method. This similarity leads to simplified versions of the Jacobi-Davidson method and the 
inverse iteration generalized Davidson (IIGD) method. We prove that the projection sub- 
space augmented by the updating direction from each of these methods is able to include the 
Rayleigh quotient iteration (RQI) direction. Hence, the locally quadratic (cubic for normal 
matrices) convergence rate of the RQI method is retained and strengthened by the subspace 
methods. The theory is supported by extensive numerical results. Preconditioned formula- 
tions are also briefly discussed for large scale eigenvalue problems. 

Keywords: Eigenvalue; Rayleigh quotient; Subspace method; Jacobi-Davidson; IIGD; 
Newton method 

1 Introduction 

We study subspace methods for the standard eigenvalue problem 

Ax = Ax, (1) 

where A is an n by n matrix, A is the eigenvalue and x is the corresponding eigenvector. 
Eigenvalue problem (1) is central to many scientific applications. Several excellent monographs 
[2, 3, 5, 13, 27, 30, 40, 42] contain in-depth discussions on properties and algorithms for eigenvalue 
problems, together with many sources from science and engineering applications. 

Subspace methods include the Arnold! method [1], Lanczos method [20] and the more so- 
phisticated but important variants, the implicit restart Arnoldi method [35] and the rational 
Krylov method [29]. Other important subspace methods include the Davidson method [7], 
Jacobi-Davidson method [32, 33, 16], Krylov-Schur method [41], truncated RQ method [37, 45], 
and preconditioned subspace eigensolvers [4, 12, 14, 18, 19, 22, 38]. 
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of Energy, under contract W-31-109-ENG-38. 
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In this paper we study mainly the Jacobi-Davidson type subspace methods for (1). We 
highhght the connection between eigenvalue computation and optimization; the crucial link is 
the Newton method for Rayleigh quotient combined with subspace techniques. This link is well 
known and is central to the success of several eigensolvers, for example, the Davidson method 
and Jacobi-Davidson method. But often the connection to the Newton method is emphasized, 
which is used to account for the (locally) fast convergence property of the eigensolvers. The 
importance of subspace methods is no doubt well known. But we feel the following discussion 
may still contribute to the understanding of the Newton method applied within a subspace 
approach. We try to emphasize the subspace approach. 

We begin the discussion from optimization. The traditional optimization algorithms are 
often one dimensional; that is, at each iteration step, the optimization is done within a one 
dimensional space. As an example we look at the Newton method for 

min/(x), xgM", (2) 

X 

where / : M" ^ M is assumed to be twice continuously differentiable. At step k we obtain an 
approximate minimizer x^. Then we solve the Newton correction equation: 

[v2/(xfc)] 5xfc = -V/(xfc). (3) 

The approximate minimizer at step A: + 1 is updated from x^ and (5xfc as 

Xfc+i := Xfc + Ofc (5xfc, (4) 

where is some line search parameter obtained by an approximate minimization; = 1 is 
the standard Newton method. Often the Armijo-Goldstein condition or Wolfe condition [10] 
is used to ensure global convergence (without line search and other globalization techniques 
Newton method may converge to local maximal or saddle point). In other words, the minimizer 
is sought within a one dimensional space span{5ii.k} with affine translation x^. 

The key advantage of a subspace approach is that the convergence may be more robust and 
the convergence rate may be faster than that of a single vector update approach. This may be 
seen clearly from the following: Instead of looking for a minimizer within a one dimensional space 
5i, we look for a minimizer within a multidimensional subspace Sk that contains Si as its sub- 
space; hence the minimum in must be no greater than the minimum in Si. Again we use the 
Newton method for (2) as an example. At step k+\, instead of looking for a single vector update 
(4), we augment the subspace by the computed Newton direction (i.e., spanjxi, X2, x^, (5xfc}). 
Then we look for a minimizer x^+i within this augmented subspace. Then it is very clear to 
see that f{Si.k+i) ^ /(xfc+i). By augmenting the subspace in this way, we retain the locally fast 
convergence property of the Newton method. A second advantage of the subspace approach is 
for clustered eigenvalues. Since even for the symmetric case the gap theorem [27] tells us that 
a single vector approach (without deflation) will not lead to a meaningful eigenvector, we have 
to compute the invariant subspace related to the clustered eigenvalues. Another advantage of 
the subspace approach is that one can seek more than one optimizer by working with different 
subspaces. This feature is particularly important for eigenvalue computation because one often 
needs a portion of the spectrum and the related eigenvectors instead of a single eigenpair. 
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Clearly the advantage of subspace technique is not gained without price. At each iteration 
of a subspace method one has to do more matrix vector products. And since orthonormal 
basis is numerically more stable than non-orthonormal ones, one needs to ortlionormalize the 
new direction against the former subspace basis. These imply that the subspace dimension can 
not be large for large scale problems. To gain efficiency, one often needs to do restart and 
deflation [13, 30, 35, 40]. Restart is an indispensable technique which ensures that the starting 
vector becomes progressively better, so that the convergence becomes faster and the subspace 
dimension is kept relatively small to n. Deflation is also important with subspace methods. 
It tries to fix converged eigenvectors so that they will not be recomputed; and by using only 
vectors orthogonal to the converged eigenvectors to augment the projection basis, the projection 
subspace will be augmented in a more efficient way. 

The important link between the eigenvalue problem (1) and optimization is the following 
nonlinear function called the Rayleigh quotient: 

V* A "X" 

Q(x) = ^— , Vx^O. (5) 
x*x 

For symmetric or Hermitian A, miux Q{x) or maxx Q{x) gives the extreme eigenvalue of A. For 
general matrix A, the Rayleigh quotient defines the field of value [15, 17]: 

J-(A) := {Q(x) I X G C"\{0}} . 

If A is normal, then ^(A) is the convex hull of the eigenvalues of A. In general .?^(A) contains 
the spectrum of A. With suitable subspaces one can locate the eigenvalues of A via Rayleigh 
quotient; this is the well known Rayleigh-Ritz procedure. Certainly when the eigenvalues are 
located in the complex plane, the optimization interpretation becomes obscure, since one can 
not compare complex values. However, if we interpret optimizers not only as the end points of 
an interval in M but also as the boundary of a region in C, then the optimization interpretation 
still makes sense. As an example of this interpretation we look at the Lanczos- or Arnoldi-type 
algorithms. It is well known that these algorithms usually approximate the exterior eigenvalues. 
Since the subspace at each iteration essentially contains the residual = Ax^ — A^x^, for 
symmetric problems is in the direction of the gradient of (5) at x^ (i.e., the steepest descent 
or ascent direction). For nonsymmetric problems may not be readily considered as gradient 
since <5(x) may not be differentiable [26], but it is likely that the subspace augmented by x^ 
contains the steepest descent or ascent direction (measured by vector norm) of (5(x). This can 
also be considered as the advantage of the subspace approach: one does not have to compute the 
exact direction directly, but the span of the vectors contains the best direction. In this vein of 
thought, there would be no surprise that the algorithms converge to the exterior eigenvalues — 
the optimizers in the complex plane. As for interior eigenvalues, the common approach is to 
apply shift and invert, that is, to map the interior eigenvalues of the original matrix into extreme 
eigenvalues of a transformed matrix, and then apply standard solvers to the transformed matrix. 

Some notations: Throughout this paper we use boldface letters for matrices and vectors. 
The upper script ()* denotes the transpose ()"^ in the real case, and the conjugate transpose ()^ 
in the complex case. 
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2 Subspace Eigensolvers: Davidson and Jacobi-Davidson 



The Davidson method and Jacobi-Davidson method are two of the typical examples of subspace 
methods for eigenvalue computation. Before getting into details we present the framework of 
subspace methods for the eigenvalue problem (1) in Algorithm 2.1, where for simplicity we 
assume that A = A* and that only the smallest eigenvalue and its corresponding eigenvector 
are sought. Note that the optimization within a subspace is done at step 5(e). 



X 



Algorithm 2.1 Framework of subspace methods for the eigenvalue problem (1): 

1. Start with an initial unit vector x, Vi 

2. Compute w = Ax, Wi ^ [w]. 

3. Compute A = Hi = x*w, r = w — Ax. 
4- If ||r|| <= e, return (A,x) as the eigenpair. 
5. Outer Loop: for j = 1, 2, m do 



(a) Call specific subspace method to construct an augmentation vector t. 



(d) Compute 



Hj h 

h* Q 



(b) Orthonormalize t against Vj to get a unit vector v. 

(c) Compute w = Av, Wj+i <— [Wj | w], Vj+i ^ [Vj | v]. 

h 

a 

(e) Compute the smallest eigenpair (A,y) o/Hj+i. (||y|| = 1). 

(f) Compute the Ritz vector x = Vj+iy 

and the residual vector r = Ax — Ax = Wj+iy — Ax. 

(g) Test for convergence: if <= e, return (A,x) as the eigenpair. 
6. Restart: Set Vi = [x], Wi = [W^+iy], Hi = [A], goto step 5. 



Note also that the stopping criterion ||r|| <= e is in the relative sense; i.e., e is related to the 
computed Ritz values. This is more important to the nonnormal problems. For A ^ A*, one 
can easily adapt the framework by modifying step 5(d) as 



H 



H 



v*w 



note only the h* term at step 5(d) needs to be changed into v*Wj. If more eigenpairs are 
required, one can compute eigenpairs of Hj+i according to different selection criteria at step 
5(e). The selection criteria include the first Z > 1) eigenvalues of the largest or smallest real 
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parts, imaginary parts or magnitudes. Then at step 5(a) one computes the augmentation vectors 
from the residual vectors related to the selected eigenpairs of H^+i. 

The other important numerical step is the orthogonalization at step 5(b). A common choice 
is the modified Gram-Schmidt (MGS) method. MGS is in practice much more stable than Gram- 
Schmidt (GS), but the orthogonality of the basis constructed by MGS depends on the condition 
number of the original set of vectors. Another problem is that MGS can not be expressed by 
Level-2 BLAS, hence parallel implementation needs more communication [2, 11]. Another choice 
for orthogonalization is the DGKS correction [6]. DGKS can be implemented more efficiently in 
parallel because DGKS is actually GS with selective reorthogonalization. To make t orthogonal 
to Vj we do the following, 

itmp ^ "^i*' ^ "t — 'Vjttmp, Vj+i ^ |j— 1|, (6) 

if (||ttmp|| > tol * llvj+ill) then do the reorthogonalization : 

Ump ^ ^j^j+l'i ^ Vj+i — Yjttmp, ^ 11 "'"'"^ii ; (7) 

II 

where the criteria means that if the angle between ttmp and Vj+i at line (6) is smaller than 
actan{tol), then reorthogonalization is necessary. 

The subspace dimension and the convergence rate are a big concern for large scale problems. 
For subspace eigensolvers the convergence rate often depends heavily on the quality of the 
starting vector. Restart can be used to refine the starting vector. We know that before restart, 
useful information may have been accumulated in the recent expansion vectors. Thus it may 
not be desirable to keep only the most recent expansion vector and throw away all the rest. In 
[39, 44] thick restart techniques are proposed. The idea is to keep as much useful information as 
possible by keeping more than one expansion vector at the first step of restart. This idea has 
proved to be efficient and easy to incorporate into the framework. 

What distinguishes each subspace method is the vector augmentation at step 5(a). When 
iterative methods (e.g., GMRES [31], BICGSTAB [46]) are used to solve for the augmentation 
vector, step 5(a) necessarily contains the inner iteration of the subspace methods. 

The Davidson method [4, 7, 22, 30] constructs t as follows: 



5(a) Construct preconditioner M^-; solve for t from: 

(Mj - AI) t = -r. 



In the original paper by Davidson [7], Mj = diag{A). Originally Davidson derived this formula- 
tion by taking the derivative of Q{xj), varying only the i-th. component of Xj (denoted as x^^j)) 
each time: 



5Xj(i) 



= 0, i = l,2, ...,n. (8) 

From (8) we get (^Xj(j) = (Aj — aa)"^ {Axj — AjXj)(j). In [8] Davidson related this formulation to 
the Newton method. It was observed that the convergence rate is related to how well the diag- 
onal matrix diag{A) approximates A. For example, if A is diagonally dominant (in eigenvalue 
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problems, diagonal dominance means that the off-diagonal elements are small compared with 
the changes in magnitude between diagonal elements [42, 22, 38]), then the Davidson method 
converges very fast. Hence the diagonal matrix (diag{A) — Ajl) was considered as a straightfor- 
ward preconditioner to A — Ajl. The main advantage of a diagonal preconditioner is that the 
preconditioned system (which is diagonal) is trivial to solve, but it requires A to be diagonally 
dominant. More sophisticated preconditioners than the diagonal matrix have been tried [4, 22], 
leading to the so called generalized Davidson method. As has been pointed out (e.g., in [32]), 
the preconditioner interpretation does not explain the improved convergence rate well since the 
exact preconditioner (A — Ajl)~^ leads to stagnation ((A — Xjl)~^r is x-,-, which lies in the 
original projection subspace). 

The Jacobi-Davidson method [32] is an important advance for subspace methods in eigen- 
value computation. At each iteration the correction vector t is required to reside within the 
orthogonal complement of the existing projection subspace. Jacobi-Davidson solves the follow- 
ing projected equation: 



5(a) Solve (approximately) for 


t s.t. t _L X and 




(I -XX*) (A 


- AI)(I-xx*)t = -r. 


(9) 



Usually (9) is solved inexactly by iterative methods. The matrix (I — xx*)(A — AI)(I — xx*) 
is always singular, but this singularity poses no essential difficulty to iterative methods for (9). 

Existing preconditioners for linear systems may be applied when solving (9) iteratively. 
This is regarded as one of the major advantages of (Jacobi-)Davidson-type subspace methods, 
since preconditioners may not be easily incorporated into the Arnoldi-type methods. Rational 
Krylov-type or (single vector) RQI-type methods often require rather exact system solves, while 
(Jacobi-)Davidson-type methods allow approximate system solves. 

An equivalent formulation to (9) is the inverse iteration generalized Davidson (IIGD) method 
by Olsen et al [24]. IIGD solves 

(A - AI)t = -r + XT, (10) 

where r is set to ensure t _L x, i.e., r = x*(A-Aij-^x ' -^^o™ (10) it is clear that the expansion 
vector t contains information in both directions (A — AI)~^r and (A — AI)~'^x. In the original 
paper [24] the authors dealt with a huge problem at that time and they could not afford the 
subspace approach; hence they ended up using single vector updating approach in their program 
(this is also pointed out in [41]). 

In [34] and [24] it was argued that Jacobi-Davidson and IIGD are (inexact) Newton-Raphson 
methods. The generalized Rayleigh quotient (5(x, y) := ^y^t^ was used in [34] to establish the 
argument. In the following section we derived a Newton update straightforwardly from the 
standard Rayleigh quotient (5). 

3 A New Class of Newton Update 

The locally fast convergence property of the Newton method in optimization is an attractive 
feature for designing fast algorithms. Davidson, Jacobi-Davidson and IIGD methods all tried 
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to establish a connection to Newton method to explain their fast convergent behavior. Yet no 
Newton method combined with subspace projection applied directly to the Rayleigh quotient 
(5) seems to have been tried. In this section we establish this interesting application. 

We assume that A* = A. (This is not for notational simplicity reason, since the nonnormal 
case may be completely different.) The gradient of the Rayleigh quotient (5) is 



VQ(x) 



2 Ax 2x*Axx 



x*x (x*x)2 

^ (Ax-Q(x)x). (11) 



x'x 



The Hessian of (5) is 

9 Ax 4 A XX* 2 4 

V^Q(x) = ^ - - ^ (x(VQ(x)*) + Q(x)I) + ^ (g(x)xx*) 

This seemingly complicated formula can be simplified. If we assume x is already normalized 
and denote A = Q{x), then 

V2Q(x) = 2(A - AI) - 4(Axx* + xx*A* - 2Axx*). (13) 

So the Newton equation V^(5(x)t = —VQ{x) is 

[(A - AI) - 2(Axx* + XX* A* - 2Axx*)]t = -(Ax - Ax) = -r. (14) 

To see the relation with the Jacobi-Davidson equation (9), we expand the matrix in the left 
hand side of (9) and get: (note x*x = 1) 

[(I-xx*)(A- AI)(I-xx*)]t 
= [(A-AI)-(Axx* + xx*A-2Axx*)]t = -r. (15) 

From (14) and (15) we see the essential difference is that the scaling of the correction term 
(Axx* + XX* A* - 2Axx*) to (A - AI) in (14) is twice as much as the one in (15). The formula 
we derived here is Newton method, hence it preserves the locally quadratic convergence rate. 
No additional treatment is needed to establish the connection to the Newton method. 

Our new algorithm using Newton direction within the subspace method framework (Algo- 
rithm 2.1) may be stated as follows: 



5(a) Solve (approximately) the following equation for t _L x: 

( A - AI - 2(Axx* + XX* A* - 2Axx*) )t = -r (16) 
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In nonlinear optimization, the Hessian matrix is often difficult to obtain, hence low rank 
updates such as DFP and BFGS are used to obtain approximate Hessian matrix, resulting in 
the Quasi-Newton methods with a super linear convergence rate. For eigenvalue computation, 
the situation is better because we can compute the explicit Hessian of the Rayleigh quotient. 
Actually in [43] the correct Hessian was derived, but the formula in [43] was not simplified and 
it was more complicated than (12); hence the authors did not try applying Newton method 
to the Rayleigh Quotient, while they proposed other very neat Newton-type methods based on 
transformed formulations of the eigenvalue problem. 

The other difference between (14) and the Jacobi-Davidson equation (15) is that the xx*A* 
term in (16) is xx*A in (15). This certainly makes no difference under the assumption A = A*. 
Actually under this assumption, equation (16) is equivalent to: 

(I-2xx*)(A- AI)(I-2xx*)t = -r. (17) 

This looks very similar to the Jacobi-Davidson equation (9). We note that (15) was derived from 
computational insight (where the idea of the Jacobi orthogonal component correction is exploited 
as the orthogonal projector). We will discuss later that this is the truly valuable insight that 
makes the Jacobi-Davidson to be so efficient. While (17) is derived rigorously by the Newton 
method. The major difference is that (I — 2xx*) is perfectly conditioned since (I — 2xx*)^ = I 
(I — 2xx* is also called the Householder reflector), while the orthogonal projector (I — xx*) is 
singular. Hence, away from an eigenvalue, (17) is better conditioned than (15). When near 
an eigenvalue A, both (15) and (17) relate to the inverse iteration on (A — AI). As discussed 
in [27, 28], the ill-conditioning of (A — AI) does not prevent computing the direction of the 
eigenvector for A correctly. 

For large scale problems, computing the Newton direction exactly is usually not practical. 
Equation (17) need to be replaced by some preconditioned equation. We can incorporate pre- 
conditioners in the same way as the generalized Davidson method or Jacobi-Davidson method 
does. I.e., if we construct a good preconditioner Mj at step j, then we solve the following: 

5(a) Applying preconditioner Mj, solve the following equation for t _L x: 

(I - 2xx*)(Mj - AI)(I - 2xx*)t = -r. (18) 

If the formulation in (16) is preferred, then we solve for t ± x from 

( Mj -XI- 2(Axx* + XX* A* - 2Axx*) )t = -r. (19) 

In both formulations (18) and (19), we may apply the easily available diagonal preconditioner 
Mj = diag{A). We note that they now can not be solved as trivially as diagonal systems. We 
also note that (19) contains more information about the original A than does (18). 

The overhead caused by the correction terms in (18) or (19) is marginal for iterative methods 
because one mainly computes the approximate solution by matrix- vector products. But we point 
out that the condition number of the matrix (which need not be formed explicitly) affects the 
convergence rate of any iterative method. For eigensolvers that utilize preconditioners from 
linear systems directly, another complication may occur, as pointed out in [36], that a good 
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preconditioner Mj for A (such as preconditioners from multigrid method) need not necessary 
be a good preconditioner for A — AI or other modifications to A. We also note that the 
interpretation of conditioning is different between hnear systems and eigenvalue problems: for 
linear system, the conditioning depends solely on amax/o'min where a denotes the singular value; 
while for eigenvalue problems, each eigenvalue has its own conditioner number and the separation 
of eigenvalues is closely related to the convergence rate of eigensolvers. 

Note that r = Ax — Ax is computed at each step of the Davidson-type methods. If A G M, 
the three correction terms in (16) and (19) may be simplified into two terms: 

Axx* + XX* A* - 2Axx* = rx* + xr*. (20) 

This simplification makes updating the matrix vector products at each iteration easier, and we 
use it in our code for the numerical tests. 

4 Unification of Jacobi-Davidson, IIGD and Newton Updates 
via RQI 

In large scale eigenvalue computation, solving preconditioned system (inexactly) via iterative 
methods is a crucial part for most of the fast eigensolvers. We see from Section 3 that when 
suitable correction equations are found, preconditioners may be readily adapted into the existing 
subspace methods by modifying the correction equations. Hence in the rest of this paper we 
concentrate on the derivation of the correction equations, without being distracted by precon- 
ditioners. 

When we talk about fast convergent algorithms for eigenvalue problems, an important can- 
didate that comes to mind is the Rayleigh quotient iteration (RQI). RQI is cubic convergent 
for normal matrices and quadratic convergent for nonnormal matrices [26] . Extensive studies of 
RQ and RQI exist. Properties of RQI and different type of generalizations may be found, for 
example, in [9, 23, 25, 26, 27, 40] and references therein. 

In the following we unify the Jacobi-Davidson, IIGD and the proposed methods in Section 3 
by RQI. This unification leads us to propose a simplified version of the Jacobi-Davidson method 
and a simplified IIGD method. We first establish the following theorem. 

Theorem 4.1 Let the initial unit vector be the same x, assuming the Rayleigh quotient A = 
x*Ax is not an eigenvalue of A yet. Then the subspace method with equation (9) or (10) or 
(17) each produces a subspace that includes the RQI direction during the next iteration. 

Proof: The subspace method with updating equation (9) or (10) or (17) is a special case of 
the following (with suitable a and (3): 

5(a) Solve (approximately) the following equation for t _L x: 

(I-axx*)(A-AI)(I-;3xx*)t = -r, V a / 0. (21) 

We now prove the following stronger statement: Solving (21) for t ± x, then the subspace 
augmented by t will include the RQI direction during the next iteration. 
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It is well known that during the next iteration, RQI produces the direction 

XR = (A-AI)-ix. 
Under the restriction x*t = 0, equation (21) becomes 

(A - AI)t = -r + ry X. 
This is actually the IIGD equation. From (22) 

t = (A - AI)~^r + ?7 (A - AI)"^x = -x + r/ x^j, 



(22) 



(23) 



where i] = l/(x*Xij) is chosen to ensure x*t = 0. Note that x is in the original projection 
subspace, say, Vj. We see that the RQI direction will be included in the augmented projection 
subspace Vj+i when we carry out 5(b) in Algorithm 2.1. ■ 

This theorem shows that the subspace method with (21) is able to retain the desirable fast 
convergence property of RQI. As discussed in the introduction section, the subspace approach 
possibly converges faster than RQI, and it also possesses the other advantages of a subspace 
method over a single- vector method. 

Formula (21) may be linked to the Newton method in other ways. From (21) and x*t = 
we see 

(I-axx*)(A- AI)t 



-r. 



This leads to: 

(A - AI)t - X = -r, x*t = 0. 
Equation (24) is equivalent to the following bordered equation: 



(24) 



(25) 



This bordered equation also appears in [28, 43] where Newton method is applied to the following 
nonlinear equation: 

(A-AI)x = 0, 



A - AI -X ■ 




" t 




— r 


-X* 












-lx*x+l = 0. 



(26) 



The bordered matrix turns out to be the Jacobian matrix of (26). The Newton correction 
equation to (26) is exactly (25). From this point of view, each of the algorithms — Jacobi- 
Davidson, IIGD and the generalized form (21) — can be viewed as a Newton method for (26) 
carried out with subspace acceleration. A hidden reason for the success of the Jacobi-Davidson 

and IIGD methods may be that the bordered matrix ^ ^ is generally nonsingular 

(even when A is a simple eigenvalue of A). The nonsingularity was first proved in [28]. In actual 
computations one does not need to form the bordered matrix. Discussion of the preconditioned 
form of (25) may be found in [36]. 

We prefer unifying the methods by RQI instead of by the Newton method, since the con- 
vergence rate for RQI is locally cubic for normal matrices and locally quadratic for nonnormal 
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matrices. Whereas the Newton method usually explains only the locally quadratic convergence 
rate for both cases. 

As seen from the proof, the fast convergence rate of these methods depends mainly on 
including the RQI direction to augment the projection subspace. This fact is well known for the 
Jacobi-Davidson method. But to cast it in a more general setting as we did here is new. Also, 
the proof leads to other interesting observations, which we discuss in the next section. 

5 Simplification of Jacobi-Davidson and IIGD 

A more critical look at the proof of Theorem 4.1 reveals that the constants before the correction 
terms in (9) and (17) (hence also (16)) surprisingly seem not essential in establishing the link to 
RQI. As seen from (21), any nonzero a seems to work. The arbitrary choice of the parameters 
(a, /3) in (21) may be convenient for algorithm design, but certainly it is not mathematically 
pleasant. 

The difference may be that from (21) to (22) we require both x*t = and implicitly r] = 
a x*(A — AI)t = l/(x*x/j). This implies that different choices of a (and possibly /3) may not 
be equivalent. But this explanation does not go further. 

We have determined, through extensive numerical experiments, that a = 1 in (21) is the 
most essential character that leads to the success of the Jacobi-Davidson method. In the Jacobi- 
Davidson method [32], a = /3 = 1 is inspired by the Jacobi orthogonal component correction. 
This construction is indeed of deep computational insight. While in (17), a = /? = 2 is derived 
from the Newton method for the Rayleigh quotient of A(= A*). Even though (I — 2xx*) is a 
perfectly conditioned matrix, without the additional restriction x*t = 0, (16) and (17) generally 
converge more slowly than do the Jacobi-Davidson and IIGD. At first sight this may seem 
questionable because from Theorem 4.1 all the methods should be equivalent. The explanation 
is that multiplying (A — AI)t = — r by perfectly conditioned matrices may not move the solution 
t away from the direction of x; hence the stagnation for the perfectly preconditioned Davidson 
method may also happen to (16) and (17), though not as seriously. Additionally requiring 
t ± x, if done after solving the correction equations (16) or (17) by direct methods, may lead 
to a poor quality t. In this case, often the DGKS method at step 5(b) is not enough to ensure 
orthogonality. One would have to perform three steps of Gram-Schmidt orthogonalization, 
which often results in a vector that does not augment the projection subspace efficiently. If the 
t _L X is enforced at each step of any iterative methods for (16) or (17), then from the equality 
(I — xx*)(I - 7XX*) = (I — 7xx*)(I — XX*) = (I - XX*), for V7, we see that (16) and (17) are 
essentially Jacobi-Davidson. 

As seen from Algorithm 2.1, after solving the correction equation for t in step 5(a), this t 
is orthonormalized against a subspace that contains x. Hence, theoretically speaking, requiring 
t _L X at step 5(a) is not necessary; but for numerical reasons, especially when iterative methods 
are used to solve the correction equation, it may be necessary to orthogonalize the solution t 
against x once in a while or at every step of the inner iteration (as suggested in [32] ) . 

We found out that choosing a = 1 in (21) is crucial in moving t sufficiently away from the 
direction of x. While the value of /3 does not appear to be important, it may be set to any value 
not too large as to lead to overflow. For cost-effective reasons, we may set (3 = and simplify 
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the Jacobi-Davidson method as follows: 



5(a) Solve (approximately) for t s.t. t _L x and: 

(I-xx*)(A- AI) t = -r. (27) 

Note that (27) implies 

(I - xx*)(A - AI) t = -(I - xx*)r, 

so we actually solve the equation from a right hand side vector orthogonal to x. 

For A 7^ A*, the unsymmetric matrix in (27) may not cause objection. For A = A*, it may 
suggest that the symmetric matrix in (9) would be preferred. We show by extensive numerical 
experiments that for both the symmetric and unsymmetric cases, if direct methods are used to 
solve (27) and (9), then the cheaper (27) performs as well as (9). We point out that if iterative 
methods are used to solve (27) and the solution at each step is orthogonalized to x, then (27) 
is the same as the original Jacobi-Davidson method. The excellent numerical behavior of direct 
methods for (27) suggests that in iterative methods, the orthogonalization of the solution to x 
may be carried out after several iteration steps instead of after each step. 

One advantage of Jacobi-Davidson over IIGD is that IIGD requires two system-solves per 
inner iteration, which is more expensive than the one system-solve Jacobi-Davidson. 

Also from (23) in the proof of Theorem 4.1, we see that in order to capture the fast conver- 
gence property of RQI and to move t away from the x direction, the r/ should not be too small. 
And (23) suggests the following simplified IIGD, which requires only one system-solve: 

5(a) Solve (approximately) for t from: 

(A - AI) t = X, (28) 

Compute the constant r/ = l/(x*t), then assign t ^ (qt — x). 

This modification is simple. Other researchers may have noticed similar equivalent formulas 
several years ago, but to our knowledge, to explicitly propose solving the IIGD equation by this 
one-solve approach is new, and no numerical results seem have been published that compare the 
one-solve approach with the mainly used two-solve approach. Our modification actually saves 
almost half the computational cost at each inner iteration of IIGD. Even with one less system- 
solve, it produces a t that is readily orthogonal to x (even when (A — AI)~^x makes a small angle 
with X. This is an improvement on the observation 4.1.(c) in [32]). Both the above analysis 
and the extensive numerical experiments that we performed show that (28) works as well as 
the IIGD. Because of the reduced (often near-singular) system solves, (28) can be expected to 
perform better than (10). Most important, it is the least expensive method but equally efficient 
to Jacobi-Davidson and IIGD. 

We note that existing preconditioned methods for linear systems may be easily incorporated 
into (27) and (28). We use (28) as an example: If at step j we have a good preconditioner Kj 
for (A — Ajl), then we solve Kjt = Xj and compute r/ = -L. Then the updating vector can be 

obtained by (r/t — Xj ) , which is always orthogonal to Xj — the direction we are interested in to 
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augment the projection basis. The significance of (28) may be seen from the following: If we stick 
to the preconditioned form of (10) (i.e., Kjt = —rj + r/Xj), then two preconditioned equations 

would have to be solved for the orthogonalization constant rj {rj = ijJ-i ^ ) since KJ rj may 

not be equal to xj. But our modified scheme (28) aptly reduced the two system-solves into one 
system-solve, even for the preconditioned formulation. 

6 Numerical Results and Discussions 

The purpose of this paper is not to present actual implementations of solvers for large scale 
eigenvalue problems. Our purpose is to study the efficiency of different correction equations. 
Different preconditioning techniques may be applied readily after the correction equations are 
derived. Currently one of the most efficient and well known eigensolvers is the Jacobi-Davidson 
method (9). IIGD (10) is equivalent to Jacobi-Davidson and is well known in the computational 
chemistry community. We propose the generalized form (21). In this section we present nu- 
merical behavior of the more specific forms (16), (17), (27) and (28), with comparisons among 
themselves and to the well known (9) and (10). In the result-plot Figures, Nl and N2 denote 
(16) and (17), respectively; JD and IIGD denote (9) and (10), respectively; and JDm and IIGDm 
denote the modified forms (27) and (28), respectively. 

The numerical experiments were done by using Matlab v6.5 on Pentium IV PCs with OS 
Linux Redhat7.3. The models are restricted to relatively small scale problems (200 < n < 1000). 
We use direct methods (Matlab "\") to solve the correction equations, since for matrices of this 
scale "\" is often at least ten times faster and more accurate than iterative methods. We note 
that when M is singular, M\r give the solution M"''r where is the pseudo-inverse of M. 

As pointed out in Section 5, in Algorithm 2.1, the solution t from step 5(a) is made orthogonal 
to x in step 5(b). Hence, for the Jacobi-Davidson type equations (9), (16), (17) and (27), we did 
not additionally require t _L x in step 5(a). This modification will tell which equation would 
naturally provide a solution that can augment the subspace efficiently. 

We tried the methods on models from the Matrix Market.^ Models from this website are 
often related to realistic and difficult problems. For all the tests we chose relative tolerance 
tol = 10"^'' as the stopping measure for the residual norm. We report results for the mod- 
els: bwm200, ck400, pde225, pde900, rdb200, rdb450, rdb968, rw496. The name of each 
model is listed in the first term at the title of each subgraph. These models may be retrieved 
easily by searching for eigenvalue problems in Matlab format at the Matrix Market website. 
The physical meaning of these models may be found at the same website. The mode denotes 
the eigenvalue selection criteria, where LR, LM, SR and SM stand for largest real part, largest 
magnitude, smallest real part and smallest magnitude, respectively. The modes were chosen 
according to the description of each model. Since there are fewer symmetric standard eigen- 
value problems in Matlab format at Matrix Market, we also applied the methods to matrices 
(A -|- A*)/2 when A ^ A*. We note that this simple symmetrization still led to problems that 
are more realistic (and usually more difficult) than randomly constructed models. 

The methods discussed in this paper generally converge fast locally, but not globally. For 

^http: //math.nist . gov/MatrixMarket/ 
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the problems from Matrix Market, without a good initial vector, each method may take a long 
time to arrive at the fast convergent region. Hence, we chose the initial vector as follows. 
We computed the eigenvector corresponding to the required eigenvalue by the Matlab function 
eig and perturbed it as the initial vector. For the nonsymmetric problems, we perturbed the 
eigenvector by 10~^ * rand{n, 1); for symmetric problems, the perturbation was set as 5 * 10"^ * 
rand{n, 1). By choosing the initial vector this way, the locally fast convergence behavior of each 
method becomes clear to observe and easy to compare. Certainly in real applications it is not 
easy to find a good initial guess without a good understanding of the models at hand, and one 
may have to resort to restart techniques. The approach we took in this section was mainly 
to verify the analysis we did in this paper, as well as to check the viability of the proposed 
formulations. 

As seen from the numerical results, the Jacobi-Davidson, IIDG and their simplified versions 

(27) (28) do show locally quadratic convergence for nonsymmetric problems and cubic conver- 
gence for symmetric problems. And the convergence appeared faster than the theoretic rate in 
some cases, which is due to the subspace acceleration. For all the problems tried, JDm and 
IIGDm behaved almost the same as Jacobi-Davidson and IIGD; as seen from the plots, the four 
lines are often indistinguishable. This result also agrees with our analysis in Sections 4 and 5. 

What needs to be explained more is the behavior of (16) and (17). Although for some 
constructed models (16) and (17) appeared to be as competitive as JD and IIGD, for most of 
the more realistic models they performed more slowly. The observed convergence behavior for 
(16) and (17) is locally quadratic for symmetric problems and linear for nonsymmetric problems. 
It may be super-quadratic or super-linear for each case because of the subspace acceleration. 
The reason is that the derivation of the gradient and Hessian in Section 3 holds true for A = A*, 
whereas the gradient for the Rayleigh quotient of A 7^ A* is different and may not exist. We 
point out that even though in the Matlab codes we did not require t _L x at step 5(a) for (16) 
and (17), adding this requirement may not help much if the correction equations are solved by 
direct methods. The reason is discussed in Section 5. If this requirement is enforced at each step 
of an iterative method for (16) and (17), then (16) and (17) essentially become Jacobi-Davidson 
and IIGD, as seen from the proof of Theorem 4.1. 

The Jacobi-Davidson equation (9) together with equations (16) and (17) led to the gener- 
alized formulation (21), which led to the unified theory and the two simplifications (27) and 

(28) . The modifications are cheaper and simpler than the original formulations. Analysis and 
numerical evidences show that they are as efficient as the original formulations. We expect that 
preconditioned formulations based on (27) and (28) will be practical in actual applications. 

The numerical results we present here evidently show that a = 1 in (21) is fundamental for 
the efficiency of the methods we discussed in this paper. We hope that they also contribute to 
the understanding and appreciation of the Jacobi-Davidson method and the IIGD method. 

One problem related to Jacobi-Davidson and IIGD type methods is the possible slow conver- 
gence when a starting vector is poor. Our unreported numerical tests show that with a random 
initial vector many iterations may be needed to reach the fast convergent region (some of the 
numerical results in [16] showed similar behavior). This suggests that efficient globalization 
techniques may need to be developed. A practical approach is to apply a few steps of cheaper 
Lanczos or Arnoldi iteration to get a better starting vector and then integrate restart during the 
Davidson type outer iteration. Another approach is to modify shifts in the correction equations. 
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This may require a better understanding on the difference between preconditioning for hnear 
systems and preconditioning for eigenvalue problems. [38] contains very interesting results in 
this direction for the symmetric eigenvalue problem. 




Figure 1: Comparison of performance for nonsymmetric models: bwm200, ck400, pde225, 
pde900. 



7 Concluding remarks 

In this paper we highlighted the value of subspace methods in eigenvalue computation. We 
also emphasized the link between eigenvalue computation and optimization through the Newton 
method. We derived a class of Newton updates for symmetric eigenvalue problems. From the 
study of the Newton updates and their similarity to the Jacobi-Davidson method, we discovered 
the generalized correction equation (21). From this generalized formula we arrived at simplified 
forms of the Jacobi-Davidson method and the IIGD method. The modification (28) actually 
halves the computational cost for the inner iteration of IIGD. We showed by extensive numerical 
results that the simplified versions perform as efficiently as the Jacobi-Davidson and IIGD meth- 
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Figure 2: Comparison of performance for nonsymmetric models: rdb968, rw496, rdb200, rdb450. 

ods. Preconditioned forms based on (27) and (28) are expected to be efficient and practical for 
large scale eigenvalue problems from real-world applications; this will be the subject of ongoing 
research. 
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